Mathematical Foundation for Semistructured Data

Author

  • Scott Uk-Jin Lee
Abstract

The rapid growth of the World Wide Web and its technologies has resulted in enormous amounts of data being used over the Internet by Web Services and Web-based applications. The increase in semistructured data usage is not limited to Web applications but extends to various other applications such as digital libraries, biological databases, and multimedia data management systems. This expansion of semistructured data usage creates the need for effective and efficient utilization of semistructured data [15]. With such a rapid increase in its usage, semistructured data needs to be stored, manipulated, and queried so that it can be utilized properly by various applications and tools. For these purposes, many researchers have proposed designing and developing adequate database systems for semistructured data. As a result, several database systems have already been developed for the eXtensible Markup Language (XML) [6], a common representation for semistructured data, while traditional database companies, such as Oracle, have added XML support to their existing database systems. As in widely used database systems, the database systems developed for semistructured data have adopted various schema-transforming operations to provide effective and efficient data storage and utilization. These schema-transforming operations are often performed using algorithms developed specifically for semistructured data storage. A schema-transforming operation guided by such an algorithm must perform correctly to ensure that the data remains consistent and that no information is lost. Although these algorithms claim to maintain the lossless and dependency preserving properties, the database systems developed for semistructured data lack verification support to prove the correctness of the transformations. In widely adopted database systems, one of the features used to prove the correctness of operations and algorithms is a mathematical foundation.
For example, in relational database systems, a mathematical foundation has been extraordinarily useful in defining normalization and in proving that lossless and dependency preserving algorithms can be defined. A mathematical foundation has also been defined to capture object-oriented concepts and used to reason about the correctness of query results in object-oriented database systems. Such verification support for the operations and algorithms of database systems ensures the correctness of data manipulations, which makes the mathematical foundation essential. However, current database systems that store semistructured data lack a mathematical foundation. When there is no general formal way of distinguishing between correct and incorrect transformations, incorrect data transformations can be introduced, resulting in unreliable or even corrupt data. Without a well-defined mathematical foundation, the functionality of database systems for semistructured data will face many limitations, making them less effective and reliable than they should be. Therefore, this research proposes to establish a well-defined mathematical foundation for semistructured data in order to address this problem. The derived mathematical foundation will be used to verify whether operations and algorithms that transform the schema of semistructured data maintain the lossless and dependency preserving properties.
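
The relational properties the abstract appeals to have standard formal tests, which hints at what an analogous foundation for semistructured data would need to support. The sketch below assumes the textbook definitions over functional dependencies (FDs), single-letter attribute names, and hypothetical helper names (`fd`, `closure`, `is_lossless_binary`, `preserves_dependencies`). It checks whether a decomposition of a relation is lossless and dependency preserving; it illustrates classical relational theory only and is not the foundation proposed in this work.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# Hypothetical helpers illustrating textbook relational checks (not from the
# original work); attributes are single-character strings, an FD is (lhs, rhs).
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD such as fd("AB", "C") for AB -> C."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+ under the FD set: apply FDs until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """R1, R2 is a lossless-join split iff the shared attributes determine R1 or R2."""
    shared = closure(r1 & r2, fds)
    return r1 <= shared or r2 <= shared


def preserves_dependencies(parts: List[Set[str]], fds: List[FD]) -> bool:
    """Check that every FD X -> Y follows from the FDs projected onto the parts,
    growing X with closures restricted to each part (no explicit projection)."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                gained = closure(result & part, fds) & part
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


F = [fd("A", "B"), fd("B", "C")]  # R(A, B, C) with A -> B and B -> C
print(is_lossless_binary({"A", "B"}, {"B", "C"}, F),
      preserves_dependencies([{"A", "B"}, {"B", "C"}], F))  # True True
print(is_lossless_binary({"A", "B"}, {"A", "C"}, F),
      preserves_dependencies([{"A", "B"}, {"A", "C"}], F))  # True False
print(is_lossless_binary({"A", "C"}, {"B", "C"}, F))  # False
```

With R(A, B, C) and F = {A → B, B → C}, splitting into {A, B} and {B, C} passes both tests; {A, B} and {A, C} is lossless but loses B → C; {A, C} and {B, C} is lossy because the shared attribute C determines neither side. A foundation for semistructured data would need comparable checks for tree-shaped schemas.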

Publication date: 2006